Abstract. - We consider a population of genotype sequences evolving on a rugged fitness landscape
with many local fitness peaks. The population walks uphill until it encounters a local fitness 
maximum. We find that the statistical properties of the walk length depend on whether the 
underlying fitness distribution has a finite mean. If the mean is finite, all the walk length cumulants 
grow with the sequence length but approach a constant otherwise. Experimental implications of 
our analytical results are also discussed. 
Introduction. - The evolutionary process of adaptation is common in nature [1] and during the
last decades, the dynamics of adaptation have been studied in several experiments on microbial
populations [2]. The nature of the adaptive process depends crucially on the availability of
beneficial mutations that improve the fitness [3]. If such mutations are readily available as
in populations of very large size, the dynamics are well described by a deterministic theory [4]
while for moderately large populations, a stochastic theory which accounts for competing
multiple mutations can be applied [5]. Here we work in the parameter regime where beneficial
mutations are rare and a population of genotype sequences performs an adaptive walk on a
fitness landscape [6,7].

More precisely, the adaptive walk model assumes that the number of mutants produced per
generation is small so that the population is genetically homogeneous and may be represented
by a single particle. The weak mutation assumption also renders the sequences differing by
more than one mutation inaccessible. Furthermore the sequences carrying mutations that decrease
the fitness do not survive and hence the adaptive walker always walks uphill. On a rugged
fitness landscape with many local optima, the walk ends when a local fitness maximum is
encountered since a better fitness is at least two mutations away as illustrated in Fig. 1.
Remarkably, under these assumptions, the model depends only on a small set of parameters
namely the sequence length and the fitness distribution underlying the fitness landscape.
Recently some theoretical predictions for the first step [8] in the walk were tested in an
experiment on a ssDNA virus [3] and a reasonable agreement between theory and experiment was
found. As the adaptive walk describes a simple and biologically realistic model of adaptation,
it is important to analyse it in detail to extend our present understanding of adaptation
dynamics.

In this Letter, we focus on the statistical properties of 
the length of adaptive walk defined as the number of bene- 
ficial mutations accumulated until the population reaches 
a local fitness maximum. Recently the walk length dis- 
tribution was calculated within an approximation for the 
model described above [10] and the mean walk length was
computed exactly in a simplified version of the adaptive 
walk [11]. However these studies assume that the fitness 
distribution has a finite mean. Here we relax this assump- 
tion and interestingly, we find that in the limit of infinitely 
long sequence, there is a transition in the behavior of the 
walk length distribution: it vanishes for fitness distribu- 
tions with finite mean but remains finite otherwise. For 
finite sequences, this result implies that the walk length 
diverges with the sequence length for distributions with 
finite mean. For such distributions, we show that all the 
walk length cumulants grow logarithmically with the se- 
quence length and find the proportionality constant for 
the first few cumulants. Our analytical results are com- 
pared with the numerical results and their experimental 
implications are also discussed. 

Fig. 1: (Color online) Schematic diagram to illustrate adaptive walk on a rugged fitness
landscape with many local maxima. The population (filled circle) with fitness $h$ has fitter
one-mutant neighbors with fitness $f, f', \ldots$ One of the better mutants is chosen with a
transition probability (1). The global maximum is not accessible to the population as it is
not a one-mutant neighbor and the walk terminates when the population reaches a local fitness
maximum.

Model. - We work with binary sequences of length $L$ so that each sequence has $L$ neighbors
which are one mutation away. As the fitness always increases in an adaptive walk (see Fig. 1),
the mutants that lower the current fitness $h$ of the walker are rejected and a mutant with
given fitness $f > h$ is chosen with a transition probability $T(f,h|f)$ proportional to the
fitness difference $f - h$ [7]. Thus the normalised transition probability is given by

$$T(f,h|f) = \frac{f-h}{\sum_{g>h}(g-h)} \qquad (1)$$



where the fitnesses are independent random variables chosen from a common distribution $p(f)$
with support on the interval $[0,u]$. Following previous works [11,12], we choose the fitnesses
from a generalised Pareto distribution defined as

$$p(f) = (1+\kappa f)^{-\frac{1+\kappa}{\kappa}} \qquad (2)$$

where the fitness $f$ is unbounded for $\kappa > 0$ and $f < -1/\kappa$ for $\kappa < 0$. The
distribution of the beneficial mutations is however governed by the upper tail of the fitness
distribution $p(f)$ [7] and hence can be one of the three universal distributions only [13,14].
The fitness distribution $p(f)$ lies in the domain of the extreme value distribution given by
Weibull distribution for $\kappa < 0$, Gumbel distribution if $\kappa \to 0$ and Fréchet
distribution for $\kappa > 0$. Although much of the experimental data on distribution of
beneficial mutations is consistent with $\kappa \to 0$ [9,15], recent works also support
$\kappa < 0$ [16] and $\kappa > 0$ [11].
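The generalised Pareto distribution (2) can be sampled by inverting its cumulative distribution $q(f) = 1 - (1+\kappa f)^{-1/\kappa}$. The following is a minimal sketch, not taken from the original work: the function names `gpd_cdf`, `gpd_quantile` and `gpd_sample` are hypothetical, and the case $\kappa = 0$ is treated as the exponential limit.

```python
import math
import random


def gpd_cdf(f, kappa):
    """Cumulative distribution q(f) of p(f) = (1 + kappa*f)^(-(1+kappa)/kappa).

    Valid for f inside the support: f >= 0, and f < -1/kappa when kappa < 0.
    """
    if kappa == 0.0:
        return 1.0 - math.exp(-f)  # exponential limit kappa -> 0
    return 1.0 - (1.0 + kappa * f) ** (-1.0 / kappa)


def gpd_quantile(u, kappa):
    """Inverse of gpd_cdf: the fitness f with q(f) = u, for 0 <= u < 1."""
    if kappa == 0.0:
        return -math.log(1.0 - u)
    return ((1.0 - u) ** (-kappa) - 1.0) / kappa


def gpd_sample(kappa, rng=random):
    """Draw one fitness value by inverse-transform sampling."""
    return gpd_quantile(rng.random(), kappa)
```

For $\kappa = -1$ the sampler reduces to a uniform distribution on $[0,1)$, which is a convenient sanity check of the inversion.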

The adaptive walk in the limits $\kappa \to \pm\infty$ is well studied theoretically. When
$\kappa \to \infty$, the adaptive walk model reduces to a greedy walk [12] for which the walk
length distribution is finite for infinitely long sequences [17] while for
$\kappa \to -\infty$, a random adaptive walk is obtained [12] for which the walk length
distribution is a Poisson distribution with mean $\ln L$ [18]. Recently the adaptive walk
model described above was studied in detail for $\kappa = -1$ and $\kappa \to 0$ and the walk
length distribution was computed [10]. Here we are interested in the properties of adaptive
walk when $\kappa$ is arbitrary but finite.
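The random adaptive walk limit is easy to simulate, which gives a quick check of the logarithmic growth of the mean walk length. The sketch below is an illustration under stated assumptions rather than the procedure of the cited works: it assumes that a fresh set of $L$ neighbors is drawn at every step, that the walker picks uniformly among the fitter neighbors, and it tracks only the cumulative probability `q` of the current fitness (the name `random_adaptive_walk_length` is hypothetical).

```python
import math
import random


def random_adaptive_walk_length(L, rng=random):
    """Number of steps of one random adaptive walk with L fresh neighbours per step."""
    q = 0.0      # cumulative probability of the current fitness; the walk starts at the bottom
    steps = 0
    while True:
        # With probability q**L none of the L neighbours is fitter and the walk stops.
        if rng.random() < q ** L:
            return steps
        # A uniformly chosen fitter neighbour has cumulative probability uniform on (q, 1).
        q += (1.0 - q) * rng.random()
        steps += 1
```

Averaging the returned value over many walks for several values of `L` and plotting against `math.log(L)` should give a straight line of unit slope, up to an additive constant.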

Following [18], we consider the conditional probability $\mathcal{P}_J(f,L)$ that the walker
takes at least $J$ steps and has a fitness $f$ at the $J$th step given that the initial fitness
is $f_0$. For long sequences, one can write down the following recursion relation for
$J \geq 0$ [10]:

$$\mathcal{P}_{J+1}(f,L) = \int_0^f dh\, L\, p(f)\, T(f,h|f)\, \left(1 - q^L(h)\right) \mathcal{P}_J(h,L) \qquad (3)$$

where $q(h) = \int_0^h dg\, p(g)$. The above equation expresses the fact that the walker can
proceed to the next step if at least one fitness value greater than the current fitness $h$ is
available which occurs with a probability
$1 - q^L(h) \approx 1 - e^{-L(1+\kappa h)^{-1/\kappa}}$. The walk length distribution $Q_J$
that exactly $J$ steps are taken is related to $\mathcal{P}_J(f,L)$ according to the following
relation [10]:

$$Q_J(L) = \int_0^u dh\, q^L(h)\, \mathcal{P}_J(h,L) \qquad (4)$$

This is because in order to terminate the walk at the $J$th step, none of the $L$ mutant
fitnesses at the next step should exceed the fitness at step $J$. In the following, we set the
initial fitness $f_0$ to be zero, $\mathcal{P}_0(f,L) = \delta(f)$, which ensures that the
walker does not start at a local fitness maximum.
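The rules of the model translate directly into a Monte Carlo simulation of a single walk. The following is a minimal sketch under stated assumptions, not the code used for the numerical results reported here: a fresh set of $L$ neighbor fitnesses is drawn from (2) at every step, the next fitness is chosen with probability proportional to $f - h$ as in (1), and the helper names `draw_fitness` and `adaptive_walk_length` are hypothetical.

```python
import math
import random


def draw_fitness(kappa, rng):
    """Sample from p(f) = (1 + kappa*f)^(-(1+kappa)/kappa) by inverting its CDF."""
    u = rng.random()
    if kappa == 0.0:
        return -math.log(1.0 - u)  # exponential limit kappa -> 0
    return ((1.0 - u) ** (-kappa) - 1.0) / kappa


def adaptive_walk_length(L, kappa, rng=random):
    """Number of adaptive steps until a local fitness maximum is reached."""
    h = 0.0      # initial fitness f_0 = 0, so the walk does not start at a maximum
    steps = 0
    while True:
        # Fresh one-mutant neighbours; only the fitter ones are accessible.
        better = [f for f in (draw_fitness(kappa, rng) for _ in range(L)) if f > h]
        if not better:
            return steps  # local maximum: no fitter neighbour is available
        # Transition probability proportional to the fitness difference f - h.
        weights = [f - h for f in better]
        h = rng.choices(better, weights=weights, k=1)[0]
        steps += 1
```

A histogram of the returned values over many independent walks estimates the walk length distribution $Q_J(L)$.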

Transition in the behavior of walk length. - Using a scaling analysis and extreme value
theory, we now show that the qualitative behavior of walk length distribution $Q_J$ changes at
$\kappa = 1$. We find that the walk length distribution vanishes for $\kappa < 1$ but remains
finite for $\kappa > 1$ as $L \to \infty$. We note that the behavior of $Q_J$ discussed above
for $\kappa \to \pm\infty$ is in accordance with our result.

For $\kappa < 1$, it is a good approximation to replace the sum on the right hand side (RHS)
of (1) by the integral $L \int_h^u dg\,(g-h)\,p(g)$ when $L$ is large [10]. Then in the limit
$L \to \infty$, the recursion equation (3) reduces to

$$P_{J+1}(f) = (1-\kappa) \int_0^f dh\, \frac{(f-h)\,(1+\kappa h)^{\frac{1-\kappa}{\kappa}}}{(1+\kappa f)^{\frac{1+\kappa}{\kappa}}}\, P_J(h) \qquad (5)$$

where $P_J(f) = \mathcal{P}_J(f, L \to \infty)$. A generating function for the distribution
$P_J(f)$ can be calculated (see [10]) which shows that $P_J(f)$ is finite. Thus from (4) it
immediately follows that $Q_J(L) \to 0$ as $L \to \infty$ for all $J$. Our numerical results in
Fig. 2 for $\kappa = 1/2$ show that for J > 4, the
distribution Qj{L = 10"*) < Qj{L = 10^) and for J < 4, 
Qj{L = 10^) < Qj{L = 10''). Thus the distribution Qj 
decreases with increasing L. 

For $\kappa > 1$, the sum in (1) cannot be replaced by an integral as the mean of the
distribution is infinite. For such fat-tailed distributions, the sum of $L$ random variables
is dominated by the largest value $\tilde{f}$ amongst them [10,11]. If at most one fitness
exceeds $\tilde{f}$, we have $L(1 - q(\tilde{f})) \sim 1$ or
$1 + \kappa\tilde{f} \approx L^{\kappa}$ for any $\kappa$. Using this result in the recursion
equation (3) and changing the variable to $z = (1+\kappa f)/L^{\kappa}$, we find that for
$\kappa > 1$,

$$\mathcal{P}_{J+1}(z,L) \propto \int_{L^{-\kappa}}^{z} dy\, (z-y)\, z^{-\frac{1+\kappa}{\kappa}} \left(1 - e^{-y^{-1/\kappa}}\right) \mathcal{P}_J(y,L) \qquad (6)$$

where the proportionality constant depends on $\kappa$ and is omitted for brevity. Since the
distribution $\mathcal{P}_1(f,L)$ for large $L$ is writeable as

$$\mathcal{P}_1(f,L) \propto \frac{1}{L^{\kappa}} \left(\frac{1+\kappa f}{L^{\kappa}}\right)^{-1/\kappa} \qquad (7)$$

it follows that for large $L$, the fitness distribution at the $J$th step of adaptive walk is
of the following scaling form:

$$\mathcal{P}_J(f,L) = \frac{1}{L^{\kappa}}\, S_J\!\left(\frac{1+\kappa f}{L^{\kappa}}\right) \qquad (8)$$

where $S_J(z)$ is a scaling function. Using this scaling form in (4) and taking the limit
$L \to \infty$, we immediately find that
$Q_J \sim (1/\kappa) \int_0^{\infty} dz\, S_J(z)\, e^{-z^{-1/\kappa}}$ is finite in agreement
with the numerical results shown in the inset of Fig. 2.

Fig. 2: Walk length distribution $Q_J$ as a function of $J$ for $\kappa = 1/2$ (main) and
$3/2$ (inset) for three sequence lengths $L$ (squares, circles and diamonds) to show that in
the infinite sequence length limit, $Q_J$ vanishes for $\kappa < 1$ but remains finite for
$\kappa > 1$. The points are joined to guide the eye. For comparison, the analytical result
for the random adaptive walk (main) is shown for $L = 500$ (broken line) and a second value of
$L$ (solid line), and for the greedy walk (inset) for infinitely long sequence.

Walk length cumulants for fitness distributions with finite mean. - For $\kappa < 1$, the
probability that the walk terminates at the $J$th step is zero or in other words, the walk goes
on indefinitely for infinitely long sequences and hence the mean number of adaptive steps
diverges with $L$. We now show that all the walk length cumulants increase logarithmically
with $L$.

On differentiating (3) twice with respect to $f$ and writing
$\mathcal{P}_J(f,L) = p(f)\, P_J(f,L)$, a straightforward calculation shows that the
distribution $P_J(f,L)$ obeys the following equation [10]:

$$P''_{J+1}(f,L) = \frac{\left(1 - q^L(f)\right) p(f)}{\int_f^u dg\,(g-f)\,p(g)}\, P_J(f,L), \quad J \geq 1 \qquad (9)$$

where prime denotes a $f$-derivative. The boundary conditions are given by [10]

$$P_J(0,L) = 0, \qquad P'_J(0,L) = \frac{\delta_{J,1}}{\int_0^u dg\, g\, p(g)} \qquad (10)$$

As (9) is non-diagonal in $J$, we work with a generating function
$G(x,f) = \sum_{J=1}^{\infty} P_J(f)\, x^J$, $x < 1$ which obeys the following second order
differential equation:

$$G''(x,f) = \frac{x \left(1 - q^L(f)\right) p(f)}{\int_f^u dg\,(g-f)\,p(g)}\, G(x,f) \qquad (11)$$

The above differential equation does not appear to be exactly solvable due to the factor
$1 - q^L(f)$ on the RHS. As this cumulative probability decreases from one to zero with
increasing $f$, we consider (11) by approximating

$$1 - q^L(f) = \begin{cases} 1, & f < \tilde{f} \qquad (12a) \\ r(f), & f > \tilde{f} \qquad (12b) \end{cases}$$

where $1 + \kappa\tilde{f} = L^{\kappa}$ as found earlier. Equation (11) has been solved by
choosing $r(f) = 0$ in [10] for $\kappa = -1$ and $\kappa \to 0$. Here we show that the leading
order behavior of the cumulants does not depend on the choice of $r(f)$. For $f < \tilde{f}$,
as a result of (12a), we have

$$G_<''(x,f) = \frac{x(1-\kappa)}{(1+\kappa f)^2}\, G_<(x,f) \qquad (13)$$

whose solution is of the form
$G_< = a_+ (1+\kappa f)^{\alpha_+} + a_- (1+\kappa f)^{\alpha_-}$ where

$$\alpha_\pm(x) = \frac{\kappa \pm \sqrt{\kappa^2 + 4x(1-\kappa)}}{2\kappa} \qquad (14)$$

and the constants $a_\pm$ can be determined using the boundary conditions (10) to finally yield

$$G_<(x,f) = \frac{x(1-\kappa)}{\sqrt{\kappa^2 + 4x(1-\kappa)}} \left[(1+\kappa f)^{\alpha_+} - (1+\kappa f)^{\alpha_-}\right] \qquad (15)$$

For $f > \tilde{f}$, using (12b) in (11), we get

$$G_>''(x,f) = \frac{x(1-\kappa)\, r(f)}{(1+\kappa f)^2}\, G_>(x,f) \qquad (16)$$

whose solution is

$$G_>(x,f) = b_1\, g_1(x,f) + b_2\, g_2(x,f) \qquad (17)$$

where the functions $g_1, g_2$ obey (16) and $b_1, b_2$ are constants. In order to compute the
walk length cumulants for large $L$, it is sufficient to find the $L$ dependence of
$b_1, b_2$. This can be done by matching the solutions $G_<$ and $G_>$ and their first
derivative at $f = \tilde{f}$ and we find

$$b_1(x,L) = L^{\kappa\alpha_+}\, b_{11}(x) + L^{\kappa\alpha_-}\, b_{12}(x) \qquad (18)$$

$$b_2(x,L) = L^{\kappa\alpha_+}\, b_{21}(x) + L^{\kappa\alpha_-}\, b_{22}(x) \qquad (19)$$

where $b_{ij}(x)$ are independent of $L$.
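The solution (15) of the equation (13) for $f < \tilde{f}$ can be verified numerically. The sketch below is only a consistency check under stated assumptions: it takes the exponents in the form $\alpha_\pm = [\kappa \pm \sqrt{\kappa^2 + 4x(1-\kappa)}]/(2\kappa)$ with $\kappa \neq 0$, uses central finite differences for the derivatives, and the function names are hypothetical.

```python
import math


def alpha_pm(x, kappa):
    """Exponents of the two power-law solutions of (13); requires kappa != 0."""
    root = math.sqrt(kappa ** 2 + 4.0 * x * (1.0 - kappa))
    return (kappa + root) / (2.0 * kappa), (kappa - root) / (2.0 * kappa)


def G_less(x, f, kappa):
    """Generating function for f below the matching point, as in (15)."""
    a_plus, a_minus = alpha_pm(x, kappa)
    root = math.sqrt(kappa ** 2 + 4.0 * x * (1.0 - kappa))
    base = 1.0 + kappa * f
    return x * (1.0 - kappa) / root * (base ** a_plus - base ** a_minus)


def ode_residual(x, f, kappa, step=1e-4):
    """G'' - x(1-kappa)/(1+kappa f)^2 * G, with G'' from a central difference."""
    second = (G_less(x, f + step, kappa) - 2.0 * G_less(x, f, kappa)
              + G_less(x, f - step, kappa)) / step ** 2
    return second - x * (1.0 - kappa) / (1.0 + kappa * f) ** 2 * G_less(x, f, kappa)


def slope_at_origin(x, kappa, step=1e-6):
    """Central-difference estimate of G'(x, 0), expected to equal x(1-kappa)."""
    return (G_less(x, step, kappa) - G_less(x, -step, kappa)) / (2.0 * step)
```

The residual should vanish up to discretisation error, and the value and slope at $f = 0$ should reproduce $G(x,0) = 0$ and $G'(x,0) = x(1-\kappa)$, the latter using that the mean of (2) is $1/(1-\kappa)$.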

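We now use (17)-(19) to write down an expression for the generating function
$H(x,L) = \sum_{J=1}^{\infty} Q_J(L)\, x^J$ of the walk length distribution. Using (12) in (4),
we have

$$H(x,L) = \int_{\tilde{f}}^{u} dh\, \left(1 - r(h)\right) p(h)\, G_>(x,h) \qquad (20)$$

$$\propto \frac{1}{L} \int_{1}^{(1+\kappa u)/L^{\kappa}} dz\, z^{-\frac{1+\kappa}{\kappa}} \left(1 - r(z)\right) G_>(x,z) \qquad (21)$$

$$= L^{\kappa\alpha_+ - 1}\, T_1(x) + L^{\kappa\alpha_- - 1}\, T_2(x) \qquad (22)$$

where the integral

$$T_1(x) \propto \int dz\, z^{-\frac{1+\kappa}{\kappa}} \left(1 - r(z)\right) \left(b_{11}\, g_1(z) + b_{21}\, g_2(z)\right)$$

is independent of $L$ which can be seen using the upper bounds namely $u = -1/\kappa$ for
$\kappa < 0$ and infinity for $\kappa > 0$. Since $\kappa\alpha_\pm - 1 < 0$ for any $\kappa$,
on taking the limit $L \to \infty$ in (22), we find that the walk length distribution vanishes
as discussed earlier.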
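The sign of the two exponents $\kappa\alpha_\pm - 1$ that control the large-$L$ behavior of the generating function can be checked on a grid of parameters. This is a small numerical sketch, assuming $\kappa\alpha_\pm = [\kappa \pm \sqrt{\kappa^2 + 4x(1-\kappa)}]/2$ with $x < 1$ and $\kappa < 1$; the function name `decay_exponents` is hypothetical.

```python
import math


def decay_exponents(x, kappa):
    """Return (kappa*alpha_plus - 1, kappa*alpha_minus - 1) for x < 1 and kappa < 1."""
    root = math.sqrt(kappa ** 2 + 4.0 * x * (1.0 - kappa))
    return (kappa + root) / 2.0 - 1.0, (kappa - root) / 2.0 - 1.0


# Scan a few values of kappa and x; both exponents should be negative,
# and the first one should always be the larger (slower decaying) of the two.
for kappa in (-5.0, -1.0, -0.5, 0.25, 0.5, 0.9):
    for x in (0.1, 0.5, 0.99):
        first, second = decay_exponents(x, kappa)
        print(f"kappa={kappa:5.2f} x={x:4.2f} exponents=({first:8.4f}, {second:8.4f})")
```

Both exponents approach zero from below only in the limit $x \to 1$, where $\sqrt{\kappa^2 + 4(1-\kappa)} = 2 - \kappa$.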

The fact that $T_1(x)$ is independent of $L$ leads to a considerable simplification of the
problem and allows us to find the cumulants to leading order in sequence length. The $n$th
cumulant is defined as [19]

$$c_n(L) = \left. \frac{d^n}{ds^n} \ln H(x,L) \right|_{s=0} \qquad (23)$$

where $s = \ln x$. As the first term on the RHS of (22) decays less rapidly than the second
term for any $\kappa$, we have $H(x,L) \approx L^{\kappa\alpha_+ - 1}\, T_1(x)$. Using this,
we immediately obtain the cumulants to leading order in $L$ as

$$c_n \approx \frac{\ell}{2} \left. \frac{d^n}{ds^n} \sqrt{\kappa^2 + 4 e^{s} (1-\kappa)} \right|_{s=0}, \quad n > 0 \qquad (24)$$

where $\ell = \ln L$. Thus we find that all the walk length cumulants increase logarithmically
with $L$. The first three cumulants computed using the last expression are given by

$$c_1 = \frac{1-\kappa}{2-\kappa}\, \ell \qquad (25)$$

$$c_2 = \frac{(1-\kappa)(2 - 2\kappa + \kappa^2)}{(2-\kappa)^3}\, \ell \qquad (26)$$

$$c_3 = \frac{(1-\kappa)(4 - 8\kappa + 6\kappa^2 - 2\kappa^3 + \kappa^4)}{(2-\kappa)^5}\, \ell \qquad (27)$$
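The closed forms for the first three cumulants can be cross-checked against the general expression (24) by differentiating $\frac{1}{2}\sqrt{\kappa^2 + 4e^{s}(1-\kappa)}$ numerically at $s = 0$. The sketch below does this with central finite differences; the function names and the step size are illustrative assumptions, and the returned numbers are the coefficients of $\ln L$.

```python
import math


def cumulant_slopes_closed_form(kappa):
    """Coefficients of ln L in the first three cumulants, as in (25)-(27)."""
    c1 = (1 - kappa) / (2 - kappa)
    c2 = (1 - kappa) * (2 - 2 * kappa + kappa ** 2) / (2 - kappa) ** 3
    c3 = ((1 - kappa) * (4 - 8 * kappa + 6 * kappa ** 2 - 2 * kappa ** 3 + kappa ** 4)
          / (2 - kappa) ** 5)
    return c1, c2, c3


def cumulant_slopes_numeric(kappa, h=1e-2):
    """Same coefficients from (24), using central differences in s = ln x at s = 0."""
    def g(s):
        return 0.5 * math.sqrt(kappa ** 2 + 4.0 * math.exp(s) * (1.0 - kappa))

    d1 = (g(h) - g(-h)) / (2 * h)
    d2 = (g(h) - 2 * g(0.0) + g(-h)) / h ** 2
    d3 = (g(2 * h) - 2 * g(h) + 2 * g(-h) - g(-2 * h)) / (2 * h ** 3)
    return d1, d2, d3


for kappa in (-1.0, 0.0, 0.5):
    print(kappa, cumulant_slopes_closed_form(kappa), cumulant_slopes_numeric(kappa))
```

For exponentially distributed fitnesses ($\kappa \to 0$) the closed forms give the coefficients $1/2$, $1/4$ and $1/8$, and for uniformly distributed fitnesses ($\kappa = -1$) the mean grows as $(2/3)\ln L$.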



In the limit $\kappa \to -\infty$, all the above cumulants are equal to $\ln L$ in agreement
with the results for random adaptive walk [18]. We also recover the previous results for
uniformly and exponentially distributed fitnesses [10]. Equations (25) and (26) also match the
results of [11] in which a fixed set of mutants during the entire walk is assumed. In
contrast, we have considered a more realistic mutation scheme in which a novel set of mutants
is available to the population at each adaptive step. The above expressions for $c_1$ and
$c_2$ have also been seen in a deterministic model of evolution [10] and a relationship of
this model to adaptive walks has been recently elucidated [11]. Figure 3 shows that our
expressions (25)-(27) agree very well with the numerical results.

Fig. 3: Plot of the first three cumulants as a function of sequence length $L$ for fitness
distributions $p(f) = 1$ (squares), $e^{-f}$ (circles) and $(1 + 0.5 f)^{-3}$ (diamonds). The
main figure shows the simulation data for mean $c_1$ (open symbols) and variance $c_2$ (filled
symbols) and the inset shows the third cumulant $c_3$. The slope of the solid lines is given
by the analytical results in (25)-(27). The numerical data has been averaged over independent
realisations of fitnesses and the data for $c_2$ has been shifted by a constant for clarity.
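A comparison of this kind requires estimating cumulants from simulated walk lengths and extracting their slope against $\ln L$. The following is a minimal sketch of such an analysis, not the procedure actually used for the figure: the function names `sample_cumulants` and `fit_slope` are hypothetical, and the slope is obtained by an ordinary least-squares fit.

```python
def sample_cumulants(lengths):
    """First three cumulants of a sample: mean, variance and third central moment."""
    n = len(lengths)
    mean = sum(lengths) / n
    second = sum((x - mean) ** 2 for x in lengths) / n
    third = sum((x - mean) ** 3 for x in lengths) / n
    return mean, second, third


def fit_slope(log_L_values, cumulant_values):
    """Least-squares slope of a cumulant against ln L."""
    n = len(log_L_values)
    mean_x = sum(log_L_values) / n
    mean_y = sum(cumulant_values) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(log_L_values, cumulant_values))
    variance = sum((x - mean_x) ** 2 for x in log_L_values)
    return covariance / variance
```

Applying `sample_cumulants` to the walk lengths collected at each sequence length and then `fit_slope` to each cumulant gives estimates that can be set against the predicted coefficients of $\ln L$.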

Discussion. - In this article, we studied a biologically realistic model of adaptation and
showed that to leading order in $L$, the average walk length is a constant for fitness
distributions with infinite mean but increases logarithmically with the sequence length
otherwise. Our analytical results agree well with the numerical simulations.

Our broad theoretical result that the adaptive walks
are short (see Fig. 3) is consistent with the experiments
on microbes [5] and fungus [20] in which 2-6 adaptive
substitutions have been observed. However more detailed
experimental studies are needed to test our predictions.
Our result (25) shows that the walk should last longer in
systems with smaller $\kappa$. This may be checked by measuring the mean walk length in
populations with $\kappa = -1$ [16], $\kappa \to 0$ [15] and $\kappa > 0$ [11]. To find the
dependence of walk length properties on $L$, varying the sequence length may not be
experimentally viable but it should be possible to set up experiments along the lines of [5]
and vary the initial fitness rank. If the initial ranks are of the order $L$, we expect our
analysis to hold [5]. Experimental data for the walk length distribution showing insensitivity
to the initial rank would then imply an underlying fat-tailed fitness distribution with
infinite mean.